AITopics | similarity information

Collaborating Authors

similarity information

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification

Zhao, Xiang, Li, Ruijie, Ning, Qiao, Guo, Shikai, Li, Hui, Ma, Qian

arXiv.org Artificial IntelligenceJul-22-2025

The identification of drug-target interactions (DTI) is critical for drug discovery and repositioning, as it reveals potential therapeutic uses of existing drugs, accelerating development and reducing costs. However, most existing models focus only on direct similarity in homogeneous graphs, failing to exploit the rich similarity in heterogeneous graphs. To address this gap, inspired by real-world social interaction behaviors, we propose SOC-DGL, which comprises two specialized modules: the Affinity-Driven Graph Learning (ADGL) module, learning global similarity through an affinity-enhanced drug-target graph, and the Equilibrium-Driven Graph Learning (EDGL) module, capturing higher-order similarity by amplifying the influence of even-hop neighbors using an even-polynomial graph filter based on balance theory. This dual approach enables SOC-DGL to effectively capture similarity information across multiple interaction scales within affinity and association matrices. To address the issue of imbalance in DTI datasets, we propose an adjustable imbalance loss function that adjusts the weight of negative samples by the parameter. Extensive experiments on four benchmark datasets demonstrate that SOC-DGL consistently outperforms existing state-of-the-art methods across both balanced and imbalanced scenarios. Moreover, SOC-DGL successfully predicts the top 9 drugs known to bind ABL1, and further analyzed the 10th drug, which has not been experimentally confirmed to interact with ABL1, providing supporting evidence for its potential binding.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.01405

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

SaGIF: Improving Individual Fairness in Graph Neural Networks via Similarity Encoding

Zhu, Yuchang, Li, Jintang, Zhang, Huizhe, Chen, Liang, Zheng, Zibin

arXiv.org Artificial IntelligenceJun-24-2025

Individual fairness (IF) in graph neural networks (GNNs), which emphasizes the need for similar individuals should receive similar outcomes from GNNs, has been a critical issue. Despite its importance, research in this area has been largely unexplored in terms of (1) a clear understanding of what induces individual unfairness in GNNs and (2) a comprehensive consideration of identifying similar individuals. To bridge these gaps, we conduct a preliminary analysis to explore the underlying reason for individual unfairness and observe correlations between IF and similarity consistency, a concept introduced to evaluate the discrepancy in identifying similar individuals based on graph structure versus node features. Inspired by our observations, we introduce two metrics to assess individual similarity from two distinct perspectives: topology fusion and feature fusion. Building upon these metrics, we propose Similarity-aware GNNs for Individual Fairness, named SaGIF. The key insight behind SaGIF is the integration of individual similarities by independently learning similarity representations, leading to an improvement of IF in GNNs. Our experiments on several real-world datasets validate the effectiveness of our proposed metrics and SaGIF. Specifically, SaGIF consistently outperforms state-of-the-art IF methods while maintaining utility performance. Code is available at: https://github.com/ZzoomD/SaGIF.

artificial intelligence, machine learning, similarity, (17 more...)

arXiv.org Artificial Intelligence

2506.18696

Country: Asia > China > Guangdong Province (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Add feedback

Online Modeling and Monitoring of Dependent Processes under Resource Constraints

Kosolwattana, Tanapol, Wang, Huazheng, Lin, Ying

arXiv.org Artificial IntelligenceOct-21-2023

Adaptive monitoring of a large population of dynamic processes is critical for the timely detection of abnormal events under limited resources in many healthcare and engineering systems. Examples include the risk-based disease screening and condition-based process monitoring. However, existing adaptive monitoring models either ignore the dependency among processes or overlook the uncertainty in process modeling. To design an optimal monitoring strategy that accurately monitors the processes with poor health conditions and actively collects information for uncertainty reduction, a novel online collaborative learning method is proposed in this study. The proposed method designs a collaborative learning-based upper confidence bound (CL-UCB) algorithm to optimally balance the exploitation and exploration of dependent processes under limited resources. Efficiency of the proposed method is demonstrated through theoretical analysis, simulation studies and an empirical study of adaptive cognitive monitoring in Alzheimer's disease.

algorithm, dependent dynamic process, online modeling and monitoring, (10 more...)

arXiv.org Artificial Intelligence

2307.14208

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Oregon (0.04)
Asia > Middle East > Iran (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables

Sheng, Taoran, Huber, Manfred

arXiv.org Artificial IntelligenceAug-6-2023

Sensor data streams from wearable devices and smart environments are widely studied in areas like human activity recognition (HAR), person identification, or health monitoring. However, most of the previous works in activity and sensor stream analysis have been focusing on one aspect of the data, e.g. only recognizing the type of the activity or only identifying the person who performed the activity. We instead propose an approach that uses a weakly supervised multi-output siamese network that learns to map the data into multiple representation spaces, where each representation space focuses on one aspect of the data. The representation vectors of the data samples are positioned in the space such that the data with the same semantic meaning in that aspect are closely located to each other. Therefore, as demonstrated with a set of experiments, the trained model can provide metrics for clustering data based on multiple aspects, allowing it to address multiple tasks simultaneously and even to outperform single task supervised methods in many situations. In addition, further experiments are presented that in more detail analyze the effect of the architecture and of using multiple tasks within this framework, that investigate the scalability of the model to include additional tasks, and that demonstrate the ability of the framework to combine data for which only partial relationship information with respect to the target tasks is available.

information, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3397330

2308.03805

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Health & Medicine > Consumer Health (0.66)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Autonomous Drug Design with Multi-Armed Bandits

Svensson, Hampus Gummesson, Bjerrum, Esben Jannik, Tyrchan, Christian, Engkvist, Ola, Chehreghani, Morteza Haghir

arXiv.org Artificial IntelligenceJan-20-2023

Recent developments in artificial intelligence and automation support a new drug design paradigm: autonomous drug design. Under this paradigm, generative models can provide suggestions on thousands of molecules with specific properties, and automated laboratories can potentially make, test and analyze molecules with minimal human supervision. However, since still only a limited number of molecules can be synthesized and tested, an obvious challenge is how to efficiently select among provided suggestions in a closed-loop system. We formulate this task as a stochastic multi-armed bandit problem with multiple plays, volatile arms and similarity information. To solve this task, we adapt previous work on multi-armed bandits to this setting, and compare our solution with random sampling, greedy selection and decaying-epsilon-greedy selection strategies. According to our simulation results, our approach has the potential to perform better exploration and exploitation of the chemical space for autonomous drug design.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2207.01393

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

Multi-view learning with privileged weighted twin support vector machine

Xu, Ruxin, Wang, Huiru

arXiv.org Machine LearningJan-26-2022

Weighted twin support vector machines (WLTSVM) mines as much potential similarity information in samples as possible to improve the common short-coming of non-parallel plane classifiers. Compared with twin support vector machines (TWSVM), it reduces the time complexity by deleting the superfluous constraints using the inter-class K-Nearest Neighbor (KNN). Multi-view learning (MVL) is a newly developing direction of machine learning, which focuses on learning acquiring information from the data indicated by multiple feature sets. In this paper, we propose multi-view learning with privileged weighted twin support vector machines (MPWTSVM). It not only inherits the advantages of WLTSVM but also has its characteristics. Firstly, it enhances generalization ability by mining intra-class information from the same perspective. Secondly, it reduces the redundancy constraints with the help of inter-class information, thus improving the running speed. Most importantly, it can follow both the consensus and the complementarity principle simultaneously as a multi-view classification model. The consensus principle is realized by minimizing the coupling items of the two views in the original objective function. The complementary principle is achieved by establishing privileged information paradigms and MVL. A standard quadratic programming solver is used to solve the problem. Compared with multi-view classification models such as SVM-2K, MVTSVM, MCPK, and PSVM-2V, our model has better accuracy and classification efficiency. Experimental results on 45 binary data sets prove the effectiveness of our method.

algorithm, information, mpwtsvm, (13 more...)

arXiv.org Machine Learning

2201.11306

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

Controlling the Quality of Distillation in Response-Based Network Compression

Vats, Vibhas, Crandall, David

arXiv.org Artificial IntelligenceDec-18-2021

The performance of a distillation-based compressed network is governed by the quality of distillation. The reason for the suboptimal distillation of a large network (teacher) to a smaller network (student) is largely attributed to the gap in the learning capacities of given teacher-student pair. While it is hard to distill all the knowledge of a teacher, the quality of distillation can be controlled to a large extent to achieve better performance. Our experiments show that the quality of distillation is largely governed by the quality of teacher's response, which in turn is heavily affected by the presence of similarity information in its response. A well-trained large capacity teacher loses similarity information between classes in the process of learning fine-grained discriminative properties for classification. The absence of similarity information causes the distillation process to be reduced from one example-many class learning to one example-one class learning, thereby throttling the flow of diverse knowledge from the teacher. With the implicit assumption that only the instilled knowledge can be distilled, instead of focusing only on the knowledge distilling process, we scrutinize the knowledge inculcation process. We argue that for a given teacher-student pair, the quality of distillation can be improved by finding the sweet spot between batch size and number of epochs while training the teacher. We discuss the steps to find this sweet spot for better distillation. We also propose the distillation hypothesis to differentiate the behavior of the distillation process between knowledge distillation and regularization effect. We conduct all our experiments on three different datasets.

batch size, distillation, similarity information, (11 more...)

arXiv.org Artificial Intelligence

2112.10047

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(2 more...)

Genre: Research Report (0.51)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Exploiting Class Similarity for Machine Learning with Confidence Labels and Projective Loss Functions

Gare, Gautam Rajendrakumar, Galeotti, John Michael

arXiv.org Artificial IntelligenceMar-25-2021

Class labels used for machine learning are relatable to each other, with certain class labels being more similar to each other than others (e.g. images of cats and dogs are more similar to each other than those of cats and cars). Such similarity among classes is often the cause of poor model performance due to the models confusing between them. Current labeling techniques fail to explicitly capture such similarity information. In this paper, we instead exploit the similarity between classes by capturing the similarity information with our novel confidence labels. Confidence labels are probabilistic labels denoting the likelihood of similarity, or confusability, between the classes. Often even after models are trained to differentiate between classes in the feature space, the similar classes' latent space still remains clustered. We view this type of clustering as valuable information and exploit it with our novel projective loss functions. Our projective loss functions are designed to work with confidence labels with an ability to relax the loss penalty for errors that confuse similar classes. We use our approach to train neural networks with noisy labels, as we believe noisy labels are partly a result of confusability arising from class similarity. We show improved performance compared to the use of standard loss functions. We conduct a detailed analysis using the CIFAR-10 dataset and show our proposed methods' applicability to larger datasets, such as ImageNet and Food-101N.

confidence label, loss function, similarity, (16 more...)

arXiv.org Artificial Intelligence

2103.13607

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Structure Learning with Similarity Preserving

Kang, Zhao, Lu, Xiao, Lu, Yiwei, Peng, Chong, Xu, Zenglin

arXiv.org Machine LearningDec-3-2019

Leveraging on the underlying low-dimensional structure of data, low-rank and sparse modeling approaches have achieved great success in a wide range of applications. However, in many applications the data can display structures beyond simply being low-rank or sparse. Fully extracting and exploiting hidden structure information in the data is always desirable and favorable. To reveal more underlying effective manifold structure, in this paper, we explicitly model the data relation. Specifically, we propose a structure learning framework that retains the pairwise similarities between the data points. Rather than just trying to reconstruct the original data based on self-expression, we also manage to reconstruct the kernel matrix, which functions as similarity preserving. Consequently, this technique is particularly suitable for the class of learning problems that are sensitive to sample similarity, e.g., clustering and semisupervised classification. To take advantage of representation power of deep neural network, a deep auto-encoder architecture is further designed to implement our model. Extensive experiments on benchmark data sets demonstrate that our proposed framework can consistently and significantly improve performance on both evaluation tasks. We conclude that the quality of structure learning can be enhanced if similarity information is incorporated. Introduction With the advancements in information technology, high-dimensional data become very common for representing the data. However, it is difficult to deal Preprint submitted to Elsevier December 4, 2019 arXiv:1912.01197v1 Fortunately, in practice data are not unstructured.

artificial intelligence, machine learning, similarity, (19 more...)

arXiv.org Machine Learning

1912.01197

Country:

Asia > China > Shandong Province > Qingdao (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Practical Federated Gradient Boosting Decision Trees

Li, Qinbin, Wen, Zeyi, He, Bingsheng

arXiv.org Machine LearningNov-11-2019

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from the inefficiency due to the usage of costly data transformations such as secure sharing and homomorphic encryption, or from the low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original record to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training with the local data of each owner, our approach can significantly improve the predictive accuracy, and achieve comparable accuracy to the original GBDT with the data from all parties.

gradient, hash value, simfl, (15 more...)

arXiv.org Machine Learning

1911.04206

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Western Australia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.62)

Add feedback